1 Statement of Authorship

This report is created by Ekene Olatunji, Catherine Nassralla, Victoria Chin, Bassam Chamas, and Sajiya Somji for the eHealth 705 Statistics in eHealth final project.

2 Executive Summary

Using the principles taught during the eHealth 705 course, the ICU Admissions dataset was explored, analyzed and a logistic regression model was created. Logistic regression was used to create a model as the response variable, Status, was categorical, as were several of the predictor variables in this dataset. Using graphs, descriptive statistics and summaries, the attributes were analyzed and recoded if needed, as well as interpreted. (include reasons why we did tests, summary of what was found. ) - overview of tests that we performed and some of the results - maybe include why we chose some of the tests - since the main predictor variable was vital status, most of the analyses and exploration was about this variable

3 About the ICU Admissions Dataset

The ICU Admissions Dataset was obtained

3.1 Description

The ICU Admission dataset contains observations of patients that were admitted to an adult intensive care unit (ICU). The response variable of this dataset is Status of patients after admission, that is, whether they lived or died after admission to ICU. A description of the variables is available below.

Number of cases: 200 Variable Names:

  1. ID: ID number of the patient
  2. STA: Vital status (0 = Lived, 1 = Died)
  3. AGE: Patient’s age in years
  4. SEX: Patient’s sex (0 = Male, 1 = Female)
  5. RACE: Patient’s race (1 = White, 2 = Black, 3 = Other)
  6. SER: Service at ICU admission (0 = Medical, 1 = Surgical)
  7. CAN: Is cancer part of the present problem? (0 = No, 1 = Yes)
  8. CRN: History of chronic renal failure (0 = No, 1 = Yes)
  9. INF: Infection probable at ICU admission (0 = No, 1 = Yes)
  10. CPR: CPR prior to ICU admission (0 = No, 1 = Yes)
  11. SYS: Systolic blood pressure at ICU admission (in mm Hg)
  12. HRA: Heart rate at ICU admission (beats/min)
  13. PRE: Previous admission to an ICU within 6 months (0 = No, 1 = Yes)
  14. TYP: Type of admission (0 = Elective, 1 = Emergency)
  15. FRA: Long bone, multiple, neck, single area, or hip fracture (0 = No, 1 = Yes)
  16. PO2: PO2 from initial blood gases (0 = >60, 1 = ²60)
  17. PH: PH from initial blood gases (0 = ³7.25, 1 <7.25)
  18. PCO: PCO2 from initial blood gases (0 = ²45, 1 = >45)
  19. BIC: Bicarbonate from initial blood gases (0 = ³18, 1 = <18)
  20. CRE: Creatinine from initial blood gases (0 = ²2.0, 1 = >2.0)
  21. LOC: Level of consciousness at admission (0 = no coma or stupor, 1= deep stupor, 2 = coma)

4 Reading in the ICU Admissions Dataset

> ICU <- read.table("./ICUAdmissions.csv", header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)

4.1 Structure of the ICU dataset

> str(ICU)
'data.frame':   200 obs. of  21 variables:
 $ ID           : int  8 12 14 28 32 38 40 41 42 50 ...
 $ Status       : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Age          : int  27 59 77 54 87 69 63 30 35 70 ...
 $ Sex          : int  1 0 0 0 1 0 0 1 0 1 ...
 $ Race         : int  1 1 1 1 1 1 1 1 2 1 ...
 $ Service      : int  0 0 1 0 1 0 1 0 0 1 ...
 $ Cancer       : int  0 0 0 0 0 0 0 0 0 1 ...
 $ Renal        : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Infection    : int  1 0 0 1 1 1 0 0 0 0 ...
 $ CPR          : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Systolic     : int  142 112 100 142 110 110 104 144 108 138 ...
 $ HeartRate    : int  88 80 70 103 154 132 66 110 60 103 ...
 $ Previous     : int  0 1 0 0 1 0 0 0 0 0 ...
 $ Type         : int  1 1 0 1 1 1 0 1 1 0 ...
 $ Fracture     : int  0 0 0 1 0 0 0 0 0 0 ...
 $ PO2          : int  0 0 0 0 0 1 0 0 0 0 ...
 $ PH           : int  0 0 0 0 0 0 0 0 0 0 ...
 $ PCO2         : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Bicarbonate  : int  0 0 0 0 0 1 0 0 0 0 ...
 $ Creatinine   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Consciousness: int  1 1 1 1 1 1 1 1 1 1 ...

The ICU Admissions dataset consists of 200 observations with 21 variables. From these observations we found;

  1. The dependent variable is the binary variable Vital Status (Status).
  2. Nineteen possible predictor variables, both discrete and continuous, were also observed.
  3. Most of the variables are integer but from information about the data, most of the variables can be recoded to factor variables. Several of the variables have two to three levels categories and so while they are numerical, their means do not offer any meaningful information.
  4. There are only four variables that can be left as numerical variables others can be recoded to categorical/factor variables.
  5. There would be a need to recode the categorical variables to factors.
  6. There are no missing data.

5 Converting Numerical Variables to Factor Variables

Attributes that could be recoded as factors included Status, Sex, Race, Service, Cancer, Renal, Infection, CPR, Previous, Type, Fracture, PCO2, PH, PO2, Bicarbonate, Creatinine and Consciousness. Labelling the factor levels with level names helps with comparative analysis and visualization.

> ICU <- within(ICU, {
+   Status <- factor(Status, labels=c('Lived','Died'))
+   Sex <- factor(Sex, labels=c('Male','Female'))
+   Race <- factor(Race, labels=c('White','Black','Other'))
+   Service <- factor(Service, labels=c('Medical','Surgical'))
+   Cancer <- factor(Cancer, labels=c('No','Yes'))
+   Renal <- factor(Renal, labels=c('No','Yes'))
+   Infection <- factor(Infection, labels=c('No','Yes'))
+   CPR <- factor(CPR, labels=c('No','Yes'))
+   Previous <- factor(Previous, labels=c('No','Yes'))
+   Type <- factor(Type, labels=c('Elective','Emergency'))
+   Fracture <- factor(Fracture, labels=c('No','Yes'))
+   PCO2 <- factor(PCO2, labels=c('No','Yes'))
+   PH <- factor(PH, labels=c('No','Yes'))
+   PO2 <- factor(PO2, labels=c('No','Yes'))
+   Bicarbonate <- factor(Bicarbonate, labels=c('No','Yes'))
+   Creatinine <- factor(Creatinine, labels=c('No','Yes'))
+   Consciousness <- factor(Consciousness, labels=c('Conscious','Deep Stupor','Coma'))
+ })

5.1 Preview of Recoded Dataset

> headTail(ICU) %>% datatable(rownames = TRUE, filter="top", options = list(pageLenght = 10, scrollX=T))%>% formatRound(columns=c(1:17), digits=0)

5.2 Saving Recoded Dataset

> # write.csv(ICU, file="ICUAdmissions_recoded.csv", row.names=FALSE)

6 Data Exploration - Statistical summaries and Graphing of variables

6.1 Viewing a few rows of the recorded dataset

> headTail(ICU) %>% datatable(rownames = TRUE, filter="top", options = list(pageLenght = 10, scrollX=T))%>% formatRound(columns=c(1:17), digits=0)
> str(ICU)
'data.frame':   200 obs. of  21 variables:
 $ ID           : int  8 12 14 28 32 38 40 41 42 50 ...
 $ Status       : Factor w/ 2 levels "Lived","Died": 1 1 1 1 1 1 1 1 1 1 ...
 $ Age          : int  27 59 77 54 87 69 63 30 35 70 ...
 $ Sex          : Factor w/ 2 levels "Male","Female": 2 1 1 1 2 1 1 2 1 2 ...
 $ Race         : Factor w/ 3 levels "White","Black",..: 1 1 1 1 1 1 1 1 2 1 ...
 $ Service      : Factor w/ 2 levels "Medical","Surgical": 1 1 2 1 2 1 2 1 1 2 ...
 $ Cancer       : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 2 ...
 $ Renal        : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ Infection    : Factor w/ 2 levels "No","Yes": 2 1 1 2 2 2 1 1 1 1 ...
 $ CPR          : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ Systolic     : int  142 112 100 142 110 110 104 144 108 138 ...
 $ HeartRate    : int  88 80 70 103 154 132 66 110 60 103 ...
 $ Previous     : Factor w/ 2 levels "No","Yes": 1 2 1 1 2 1 1 1 1 1 ...
 $ Type         : Factor w/ 2 levels "Elective","Emergency": 2 2 1 2 2 2 1 2 2 1 ...
 $ Fracture     : Factor w/ 2 levels "No","Yes": 1 1 1 2 1 1 1 1 1 1 ...
 $ PO2          : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 2 1 1 1 1 ...
 $ PH           : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ PCO2         : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ Bicarbonate  : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 2 1 1 1 1 ...
 $ Creatinine   : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ Consciousness: Factor w/ 3 levels "Conscious","Deep Stupor",..: 1 1 1 1 1 1 1 1 1 1 ...

Findings

  • The following variables were successfully recoded; Status, Sex, Race, Service, Cancer, Renal, Infection, CPR, Previous, Type,Fracture, PCO2,PH, PO2, Bicarbonate, Creatinine, and Consciousness.

6.2 Statistical Summary of the Recoded Dataset

> summary(ICU)
       ID          Status         Age            Sex         Race    
 Min.   :  4.0   Lived:160   Min.   :16.00   Male  :124   White:175  
 1st Qu.:210.2   Died : 40   1st Qu.:46.75   Female: 76   Black: 15  
 Median :412.5               Median :63.00                Other: 10  
 Mean   :444.8               Mean   :57.55                           
 3rd Qu.:671.8               3rd Qu.:72.00                           
 Max.   :929.0               Max.   :92.00                           
     Service    Cancer    Renal     Infection  CPR         Systolic    
 Medical : 93   No :180   No :181   No :116   No :187   Min.   : 36.0  
 Surgical:107   Yes: 20   Yes: 19   Yes: 84   Yes: 13   1st Qu.:110.0  
                                                        Median :130.0  
                                                        Mean   :132.3  
                                                        3rd Qu.:150.0  
                                                        Max.   :256.0  
   HeartRate      Previous         Type     Fracture   PO2        PH     
 Min.   : 39.00   No :170   Elective : 53   No :185   No :184   No :187  
 1st Qu.: 80.00   Yes: 30   Emergency:147   Yes: 15   Yes: 16   Yes: 13  
 Median : 96.00                                                          
 Mean   : 98.92                                                          
 3rd Qu.:118.25                                                          
 Max.   :192.00                                                          
  PCO2     Bicarbonate Creatinine     Consciousness
 No :180   No :185     No :190    Conscious  :185  
 Yes: 20   Yes: 15     Yes: 10    Deep Stupor:  5  
                                  Coma       : 10  
                                                   
                                                   
                                                   

6.3 Numerical Summary of Integer Variables

> numSummary(ICU[,c("Age", "HeartRate", "Systolic"), drop=FALSE], statistics=c("mean", "sd", "IQR", 
+   "quantiles"), quantiles=c(0,.25,.5,.75,1))
             mean       sd   IQR 0%    25% 50%    75% 100%   n
Age        57.545 20.05465 25.25 16  46.75  63  72.00   92 200
HeartRate  98.925 26.82962 38.25 39  80.00  96 118.25  192 200
Systolic  132.280 32.95210 40.00 36 110.00 130 150.00  256 200

Findings

  • Age ranges from 16-92 years old with a mean of 57.55 and median of 63
  • Systolic blood pressure ranges from 36-256 mmHg with a mean of 132.3 and median of 130
  • Heart rate ranges from 39-192 beats per minute with a mean of 98.92 and median of 96

7 Data Exploration - Graphs

7.1 Examining the Variables Distribution

================================================================================

   Variable   p_1 p_10   p_25  p_50   p_75  p_90   p_99
1  Systolic 55.92   92    110   130    150   170 212.12
2       Age 16.99   21  46.75    63     72    78     91
3 HeartRate 45.98   65     80    96 118.25 136.1 162.08
4        ID 11.96 81.3 210.25 412.5 671.75 829.8 924.01

Using the xray package, general trends in the dataset were identified.

  1. There are more male observations in the dataset than females.

  2. Many of the admissions to the ICU were emergencies, with a about a quarter of admissions being elective. This could relate to elective surgical procedures where morbidity could have been high or complications occurred. It is unclear whether people would be preemptively admitted to the ICU for high-risk procedures or if these elective admissions could be thought of as unforeseen or emergencies in themselves.

  3. Looking at Status, more than 75% of observations lived after their ICU admission.

  4. While the numbers are roughly even, slightly more procedures were surgical.

  5. More admitted patients had no chronic renal failure, previous admissions to the ICU, fracture, CPR or cancer when admitted. Most admitted patients had PO2 above or equal to 60, blood pH above or equal to 7.25, PCO2 below or equal to 45, and creatinine below or equal to 2.

  6. Most patients admitted were conscious, with slighly more patients being comatose than in a deep stupor if unconscious.

  7. Looking at histograms of the three numerical variables in the dataset, which were Age, Systolic and HeartRate, it could be guess that if any of the distributions were to be normal, they would be Systolic and HeartRate. Age is very obviously not normally distributed. Systolic looks like it is centered around 140, and the mean confirms this as mentioned above. The mean of HeartRate seems to be centered around 100 and its mean confirms this visual estimate.

7.2 Distribution of Vital Status by Factor Variables

> library(ggplot2)
> f01<-ggplot(ICU, aes(x=Sex, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by Sex")
> 
> f02<-ggplot(ICU, aes(x=Race, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by Race")
> f03<-ggplot(ICU, aes(x=Service, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by Service")
> 
> f04<-ggplot(ICU, aes(x=Cancer, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by Cancer")
> 
> library(Rmisc)
> multiplot(f01, f02, f03, f04,  layout=matrix(c(1:4), nrow=2, byrow=TRUE))

> f05<-ggplot(ICU, aes(x=Renal, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by Renal")
> 
> f06<-ggplot(ICU, aes(x=Infection, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by Infection")
> 
> f07<-ggplot(ICU, aes(x=CPR, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by CPR")
> 
> f08<-ggplot(ICU, aes(x=Previous, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by Previous")
> 
> library(Rmisc)
> multiplot(f05, f06, f07, f08,  layout=matrix(c(1:4), nrow=2, byrow=TRUE))

> f09<-ggplot(ICU, aes(x=Type, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by Type")
> 
> f10<-ggplot(ICU, aes(x=Fracture, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by Fracture")
> 
> f11<-ggplot(ICU, aes(x=PO2, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by PO2")
> 
> f12<-ggplot(ICU, aes(x=PH, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by PH")
> 
> library(Rmisc)
> multiplot(f09, f10, f11, f12,  layout=matrix(c(1:4), nrow=2, byrow=TRUE))

> f13<-ggplot(ICU, aes(x=PCO2, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by PCO2")
> 
> f14<-ggplot(ICU, aes(x=Bicarbonate, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by Bicarbonate")
> 
> f15<-ggplot(ICU, aes(x=Creatinine, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by Creatinine")
> 
> f16<-ggplot(ICU, aes(x=Consciousness, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by Consciousness")
> 
> 
> library(Rmisc)
> multiplot(f13, f14, f15, f16, layout=matrix(c(1:4), nrow=2, byrow=TRUE))

7.3 Plotting the density Distribution of the Numeric Variables

> d01 <- ggplot(ICU, aes(x=Age)) +
+  geom_density(fill="green") +
+  ggtitle("Age") +
+  theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+  theme(axis.text.x=element_text(size=14 )) +
+  annotate("text", x=0.8, y=-0.001, label="Base=315",  size=4)
> 
> d02 <- ggplot(ICU, aes(x=Systolic)) +
+  geom_density(fill="green") +
+  ggtitle("Systolic") +
+  theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+  theme(axis.text.x=element_text(size=14 )) +
+  annotate("text", x=0.8, y=-0.001, label="Base=315",  size=4)
> 
> d03 <- ggplot(ICU, aes(x=HeartRate)) +
+  geom_density(fill="green") +
+  ggtitle("HeartRate") +
+  theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+  theme(axis.text.x=element_text(size=14 )) +
+  annotate("text", x=0.8, y=-0.001, label="Base=315",  size=4)
> 
> d04 <- ggplot(ICU, aes(x=ID)) +
+  geom_density(fill="green") +
+  ggtitle("ID") +
+  theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+  theme(axis.text.x=element_text(size=14 )) +
+  annotate("text", x=0.8, y=-0.001, label="Base=315",  size=4)
> 
> multiplot(d01, d02, d03, d04, layout=matrix(c(1:4), nrow=2, byrow=TRUE))

7.4 Density Plot of Vital Status by Numeric Variables

> n01<-ggplot(ICU, aes(x=Age, fill = Status)) +
+   theme_bw() +
+       geom_density(alpha=0.5) +
+   labs(y = "Density",
+        title = "Distribution of Vital Status by Age")
> 
> n02<-ggplot(ICU, aes(x=Systolic, fill = Status)) +
+   theme_bw() +
+       geom_density(alpha=0.5) +
+   labs(y = "Density",
+        title = "Distribution of Vital Status by Systolic")
> 
> n03<-ggplot(ICU, aes(x=HeartRate, fill = Status)) +
+   theme_bw() +
+       geom_density(alpha=0.5) +
+   labs(y = "Density",
+        title = "Distribution of Vital Status by HR")
> 
> n04<-ggplot(ICU, aes(x=Age, fill = Status)) +
+   theme_bw() +
+   facet_wrap(~ Sex) +
+       geom_density(alpha=0.5) +
+   labs(y = "Density",
+        title = "Density distribution of Vital Status in male and female patients by Age")
> 
> multiplot(n01, n02, n03, n04, layout=matrix(c(1:4), nrow=2, byrow=TRUE))

7.5 Observations

  • ***There are more male than female in the population.
  • The number of patient who live is higher than the population who died
  • More Patients from the population aged 50 and above died in both male and female
  • The female population experienced more death than men of the same age bracket as
  • ****The density plot is peaked at the top at the age 75 for both male and female meaning more patients aged 75 died.

8 Catherine

8.1 Relationships with Age

The mean of Age, referring to the descriptive statistics for this attributes, is about 58 years. Visually, it can be seen that the distribution of Age has two peaks, one around 20 years of age and one around 70 years of age. The majority of subjects admitted to the ICU seem to be between 40-80 years old.

> ggplot(ICU, aes(x=Age)) +
+  geom_density(fill="green") +
+  ggtitle("Age") +
+  theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+  theme(axis.text.x=element_text(size=14 )) +
+  annotate("text", x=0.8, y=-0.001, label="Base=315",  size=4)

Overlaying Status and Age, density plots show that deaths after admission follow the larger Age peak that contains middle aged to elderly individuals.

> ggplot(ICU, aes(x=Age, fill = Status)) +
+   theme_bw() +
+       geom_density(alpha=0.5) +
+   labs(y = "Density",
+        title = "Density distribution of Vital Status by Age")

While the plot of Age and death follow the same pattern for older individuals, the association between Cancer and Age is less clear. There are three peaks for of Ages where ICU admissions involved cancer. These are around 20, 50 and 70 years of age. Given the relatively small number of individuals who had cancer involvement with their admission, and the variability in ages associated, cancer might not be the best predictor of ICU admission.

> ggplot(ICU, aes(x=Age, fill = Cancer)) +
+   theme_bw() +
+       geom_density(alpha=0.5) +
+   labs(y = "Density",
+        title = "Density distribution of Cancer by Age")

ICU admissions that involved infection centered around 60-65 years of age. This associations is relatively clear visually looking at the plot below.

> ggplot(ICU, aes(x=Age, fill = Infection )) +
+   theme_bw() +
+       geom_density(alpha=0.5) +
+   labs(y = "Density",
+        title = "Density distribution of Infection by Age")

Another association that was explored was between Age and Consiousness. All three categories for Consciousness overlap with peaks between 50 and 75 years of age. There is a distinct peak at around 50 for deep stupor, indicating that this age might be associated with deep stupor when related to ICU admissions. However, deep stupor also has a secondary peak at around 75, which make it less clearly how this consciousness category relates to age in this dataset.

> ggplot(ICU, aes(x=Age, fill = Consciousness )) +
+   theme_bw() +
+       geom_density(alpha=0.5) +
+   labs(y = "Density",
+        title = "Density distribution of Consciousness by Age")

Both admission Types, elective and emergency, happen more frequently with advancing age. However, elective admissions happen more frequently with older age. This might suggest that older individuals are more likely to have complications during surgical or other procedures that would require admissions to an intensive care unit.

> ggplot(ICU, aes(x=Age, fill = Type )) +
+   theme_bw() +
+       geom_density(alpha=0.5) +
+   labs(y = "Density",
+        title = "Density distribution of Type by Age")

8.2 Relationships with HeartRate

Using a density estimate plot, the distribution of HeartRate was produced. The main peak here is at about 90 beats per minute. There appears to be secondary peak around 130 beats per minute. The mean for HeartRate calculated by Rcmdr is 99. Looking at the data, this mean might be said to not accurately portray the frequency of values for HearRate.

>  ggplot(ICU, aes(x=HeartRate)) +
+  geom_density(fill="green") +
+  ggtitle("HeartRate") +
+  theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+  theme(axis.text.x=element_text(size=14 )) +
+  annotate("text", x=0.8, y=-0.001, label="Base=315",  size=4)

Both status categories have similar peaks when plotted on a density estimate plot of HeartRate. This indicates that there might not be a difference in the Status response category based on HeartRate. People who lived did seem to have a higher density of HeartRate values around 90 than those who died. A normal adult heart rate rangest between 60 and 100 beats per minute. This graph might suggest that people who lived more often had heartrates within this range, whereas those who died seemed more likely to have higher heartrates. This could inform further analysis or research.

> ggplot(ICU, aes(x=HeartRate, fill = Status)) +
+   theme_bw() +
+       geom_density(alpha=0.5) +
+   labs(y = "Density",
+        title = "Density distribution of Vital Status HeartRate")

8.3 Relationships with Systolic

Below is a density plot of Systolic blood pressure in mmHg. A peak appears around 130-140mmHg. The peak is quite narrow and distinct compared to the density plots of Age and HeartRate, suggesting that the majority of individuals had blood pressure readings that are reflective of this plot. The mean for this attribute was 132mmHg, which seems to be more valid than the means of the other numeric variables, based on the distribution of the frequency of values.

> ggplot(ICU, aes(x=Systolic)) +
+  geom_density(fill="green") +
+  ggtitle("Systolic") +
+  theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+  theme(axis.text.x=element_text(size=14 )) +
+  annotate("text", x=0.8, y=-0.001, label="Base=315",  size=4)

The distribution of Systolic values for those who died seem to be concentrated around 75 mmHg and 140mmHg. A normal Systolic blood pressure can vary widely but the American Heart Association states that blood pressure below 120mmHg is normal. However, excessively low blood pressure could be a result of bleeding, for example, and can result in insufficient blood flow to critical organs. Medications used to restore blood pressure are used when blood pressure becomes too low. This might explain the peak around 75mmHg in the death curve in the graph below.

> ggplot(ICU, aes(x=Systolic, fill = Status)) +
+   theme_bw() +
+       geom_density(alpha=0.5) +
+   labs(y = "Density",
+        title = "Density distribution of Vital Status Systolic")

9 Uniform Distribution

The distribution of the ICU attributes will be explored below. Many of the attributes chosen to be tested here relate to the history of the individuals in the data set or the circumstances of the admission, rather than lab tests on admission. It would be interesting to explore Most of the categorical attributes in this dataset have two categories, making the degrees of freedom 1 for chi squared distribution.

Theoretical chi square distribution with 1 degrees of freedom

Theoretical chi square distribution with 1 degrees of freedom

For Categorical variables with two categories:
H0: the attribute is distributed uniformally, the chi squared value does not exceed the hypothesized value of 3.841 (p = 0.05)
Ha: the attribute is not distiributed uniformally, The chi squared value exceeds the hypothesized value of 3.841

Chi squared test - Status:

The chi squared test for Status returns a chi squared value above the null hypothesized value with a very small p value (<0.05), indicating that there is a very low risk of Type I error if the null hypothesis is rejected. It can be concluded that Status is not uniformally distributed and that statistically more individuals lived than died on admission to the ICU.

> local({
+   .Table <- with(ICU, table(Status))
+   cat("\ncounts:\n")
+   print(.Table)
+   cat("\npercentages:\n")
+   print(round(100*.Table/sum(.Table), 
+   2))
+   .Probs <- c(0.5,0.5) 
+   chisq.test(.Table, p=.Probs)
+ })

counts:
Status
Lived  Died 
  160    40 

percentages:
Status
Lived  Died 
   80    20 

    Chi-squared test for given probabilities

data:  .Table
X-squared = 72, df = 1, p-value < 2.2e-16

Chi squared test - CPR:

Similarly, the chi squared test for CPR returns a very high chi squared value at 151, and a low p value, allowing us to reject the null hypothesis that this attribute is uniformally distributed. More individuals did not have CPR than did before ICU admission. In fact, as seen from the table below, few individuals received CPR at all, which might make it a poor predictor of ICU outcomes.

> local({
+   .Table <- with(ICU, table(CPR))
+   cat("\ncounts:\n")
+   print(.Table)
+   cat("\npercentages:\n")
+   print(round(100*.Table/sum(.Table), 
+   2))
+   .Probs <- c(0.5,0.5) 
+   chisq.test(.Table, p=.Probs)
+ })

counts:
CPR
 No Yes 
187  13 

percentages:
CPR
  No  Yes 
93.5  6.5 

    Chi-squared test for given probabilities

data:  .Table
X-squared = 151.38, df = 1, p-value < 2.2e-16
> table(ICU$CPR, ICU$Status)
     
      Lived Died
  No    154   33
  Yes     6    7

Chi squared test - Infection:

The chi squared value produced from this test is only slightly higher than the null hypothesized value, with a p value of 0.024, indicating a 2.4% chance of a Type I error if the null hypothesis is rejected. While the distribution can be concluded to not be uniform based on the threshold set with the null hypothesis, the uniformity of Infection among patients admitted to the ICU could be explored in further research. It should be noted that in the group of individuals who died, more had infections than not.

> local({
+   .Table <- with(ICU, table(Infection))
+   cat("\ncounts:\n")
+   print(.Table)
+   cat("\npercentages:\n")
+   print(round(100*.Table/sum(.Table), 
+   2))
+   .Probs <- c(0.5,0.5) 
+   chisq.test(.Table, p=.Probs)
+ })

counts:
Infection
 No Yes 
116  84 

percentages:
Infection
 No Yes 
 58  42 

    Chi-squared test for given probabilities

data:  .Table
X-squared = 5.12, df = 1, p-value = 0.02365
> table(ICU$Infection, ICU$Status)
     
      Lived Died
  No    100   16
  Yes    60   24

Chi squared test - Previous:

The chi squared value here exceeds the null hypothesized value with a p value much smaller than 0.05, allowing us to reject the null hypothesis that this attribute is uniformally distributed. More individuals in both status categories did not have a previous ICU admission.

> local({
+   .Table <- with(ICU, table(Previous))
+   cat("\ncounts:\n")
+   print(.Table)
+   cat("\npercentages:\n")
+   print(round(100*.Table/sum(.Table), 
+   2))
+   .Probs <- c(0.5,0.5) 
+   chisq.test(.Table, p=.Probs)
+ })

counts:
Previous
 No Yes 
170  30 

percentages:
Previous
 No Yes 
 85  15 

    Chi-squared test for given probabilities

data:  .Table
X-squared = 98, df = 1, p-value < 2.2e-16

Chi squared test - Sex:

The chi squared value returned from this test is 11.52, which is not as far from the null hypothesized value as some of the other values generated from other tests. The value and p value of 0.00069 still indicated that the null hypothesis should be reject and that Sex is not uniformally distributed. There are more men in this dataset than females.

> local({
+   .Table <- with(ICU, table(Sex))
+   cat("\ncounts:\n")
+   print(.Table)
+   cat("\npercentages:\n")
+   print(round(100*.Table/sum(.Table), 
+   2))
+   .Probs <- c(0.5,0.5) 
+   chisq.test(.Table, p=.Probs)
+ })

counts:
Sex
  Male Female 
   124     76 

percentages:
Sex
  Male Female 
    62     38 

    Chi-squared test for given probabilities

data:  .Table
X-squared = 11.52, df = 1, p-value = 0.0006885

In both sex categories, despite the inequality in the number of observations in each group, there are still more individuals who lived than died.

> with(ICU, Barplot(Sex, by=Status, 
+   style="divided", legend.pos="above", 
+   xlab="Sex", ylab="Frequency"))

Chi squared test - Type:

The distribution of Type can be concluded to be non-uniform as the null hypothesis should be rejected based on the chi squared value produced and low p value. As seen in exploratory data analysis, the there are more individuals who are had emergency admissions than elective admissions. This difference in distribution is statistically significant. Fewer individuals died when they were admitted to the ICU on an elective basis.

> local({
+   .Table <- with(ICU, table(Type))
+   cat("\ncounts:\n")
+   print(.Table)
+   cat("\npercentages:\n")
+   print(round(100*.Table/sum(.Table), 
+   2))
+   .Probs <- c(0.5,0.5) 
+   chisq.test(.Table, p=.Probs)
+ })

counts:
Type
 Elective Emergency 
       53       147 

percentages:
Type
 Elective Emergency 
     26.5      73.5 

    Chi-squared test for given probabilities

data:  .Table
X-squared = 44.18, df = 1, p-value = 2.995e-11
> with(ICU, Barplot(Type, by=Status, 
+   style="divided", legend.pos="above", 
+   xlab="Type", ylab="Frequency"))

For Categorical variables with three categories:
Ho: the attribute is distributed uniformally, the chi squared value does not exceed the hypothesized value of 5.992 (p = 0.05)
Ha: the attribute is not distiributed uniformally, The chi squared value exceeds the hypothesized value of 5.992 (p=0.05)

Theoretical chi square distribution with 2 degrees of freedom

Theoretical chi square distribution with 2 degrees of freedom

Chi squared test - Consciousness

The null hypothesis for this attribute’s distribution is that an equal number of observations will be found in each category of Consciousness. The chi squared value produced by this test exceeds the null hypothesized value with a low p value, indicating that the observations in this category are statistically not uniform. This could have been guessed from counts of observations in each category, where most individuals were in the conscious category.

> local({
+   .Table <- with(ICU, 
+   table(Consciousness))
+   cat("\ncounts:\n")
+   print(.Table)
+   cat("\npercentages:\n")
+   print(round(100*.Table/sum(.Table), 
+   2))
+   .Probs <- c(0.333333333333333,
+   0.333333333333333,0.333333333333333) 
+   chisq.test(.Table, p=.Probs)
+ })

counts:
Consciousness
  Conscious Deep Stupor        Coma 
        185           5          10 

percentages:
Consciousness
  Conscious Deep Stupor        Coma 
       92.5         2.5         5.0 

    Chi-squared test for given probabilities

data:  .Table
X-squared = 315.25, df = 2, p-value < 2.2e-16

9.1 Normal Distribution

Distribution of the three numerical variables in the ICU dataset was evaluated using quantile-quantile plots and the Shapiro-Wilk’s test for normality. Confidence intervals will be produced for the means of each attribute.

H0: Age, HeartRate and Systolic are normally distributed
Ha: Age, HeartRate and Systolic are not normally distributed

Tests of normality - Age:

Recall that a quantile-quantile plot should produce a nearly linear plot using dataset values, with the intercept going through zero, if the null hypothesis is satisfied. Below, the plotted points of Age do not conform well to the line of best fit, and are frequently outside of the confidence bands. This supports what might have been hypothesized when the histogram of Age was produced: this distribution is not clearly centered around a mean. This plot suggests that the null hypothesis that Age is normally distributed should be rejected.

> with(ICU, qqPlot(Age, dist="norm", id=list(method="y", n=2, labels=rownames(ICU)), main="QQ plot of Age"))

[1] 23 97

Using the Shapiro-Wilk normality test, the conclusions drawn from the QQ plot can be supported, with a very low risk of Type I error, base on this p value. The Shapiro Wilk test is sensitive when used on larger datasets, so this result should be taken into account with the other tests used.

> normalityTest(~Age, test="shapiro.test", data=ICU)

    Shapiro-Wilk normality test

data:  Age
W = 0.92836, p-value = 2.507e-08

Using the t-test, given the confidence interval for this attribute is between 54.75 and 60.34, an interval that does not contain 0. (choosing not to add this as I do not have a test mean - need to refresh on what a true mean is )

Tests of normality - HeartRate:

The QQ plot of HeartRate appears to adhere well to the line of best fit, despite several points in the middle of the graph lying along one side of the confidence band. Most of the points here fall within the confidence band. There is a positive skew of this data towards the lower values of HeartRate. Visually, one might conclude that this data is normally distributed.

> with(ICU, qqPlot(HeartRate, dist="norm", id=list(method="y", n=2, labels=rownames(ICU)), main="QQ plot of HeartRate"))

[1] 125  48

The results of the Shapiro-Wilk test very narrowly allow one to reject the null hypothesis of normal distribution. However the p value is very close to the threshold p value of 0.05, which decreases the confidence that might be had to reject normality based on this test and the sample size used.

> normalityTest(~HeartRate, test="shapiro.test", data=ICU)

    Shapiro-Wilk normality test

data:  HeartRate
W = 0.98598, p-value = 0.04478

Test of normality - Systolic:

Similar to the plot of HeartRate, the QQ plot of Systolic displays points that line closely on the line of best fit. The confidence bands here are quite narrow, and it is only along the ends oof the plot the points very clearly exit the confidence bands. One might conclude that the null hypothesis could be rejected for the normal distribution of Systolic based on this plot. There is a very slight negative skew here towards higher values of Systolic.

> with(ICU, qqPlot(Systolic, dist="norm", id=list(method="y", n=2, labels=rownames(ICU)), main="QQ Plot of Systolic"))

[1] 200 179

While more statistically significant compared to the Shapiro Wilk test for normality of the HeartRate distribution, the p value here is not far from 0.05, indicating a 2% risk of Type I error. Based on this test, the null hypothesis could be rejected. The distribution of Systolic appears to be non-normal.

> normalityTest(~Systolic, test="shapiro.test", data=ICU)

    Shapiro-Wilk normality test

data:  Systolic
W = 0.98369, p-value = 0.0204

In reference to Ian Fellows’ comments about tests for normality, these tests for normality do not add much more to understanding of the ICU dataset and the contributions of these three numerical variables to the determination of Status, beyond what is provided by the xray package. In the same vein, as the central limit theorem and bootstrapping have shown, most distributions can become normal with repeated sampling.

10 Victoria

11 Differences in Means

11.1 Summary Statistics

> numSummary(ICU[,c("Age", "Systolic", "HeartRate"), drop=FALSE], groups=ICU$Status, statistics=c("mean", "sd", "se(mean)"),quantiles=c(0,.25, .5, .75,1))

Variable: Age 
        mean       sd se(mean)   n
Lived 55.650 20.42818 1.614990 160
Died  65.125 16.64900 2.632438  40

Variable: Systolic 
          mean       sd se(mean)   n
Lived 135.6438 29.80151 2.356016 160
Died  118.8250 41.08084 6.495451  40

Variable: HeartRate 
         mean       sd se(mean)   n
Lived  98.500 26.97868 2.132852 160
Died  100.625 26.49304 4.188918  40
> library(psych)
> describeBy(ICU, ICU$Status)

 Descriptive statistics by group 
group: Lived
               vars   n   mean     sd median trimmed    mad min max range  skew
ID                1 160 457.04 276.35  438.5  454.25 346.19   8 929   921  0.07
Status*           2 160   1.00   0.00    1.0    1.00   0.00   1   1     0   NaN
Age               3 160  55.65  20.43   61.0   56.86  19.27  16  91    75 -0.55
Sex*              4 160   1.38   0.49    1.0    1.34   0.00   1   2     1  0.51
Race*             5 160   1.19   0.50    1.0    1.05   0.00   1   3     2  2.65
Service*          6 160   1.58   0.49    2.0    1.60   0.00   1   2     1 -0.33
Cancer*           7 160   1.10   0.30    1.0    1.00   0.00   1   2     1  2.64
Renal*            8 160   1.07   0.25    1.0    1.00   0.00   1   2     1  3.38
Infection*        9 160   1.38   0.49    1.0    1.34   0.00   1   2     1  0.51
CPR*             10 160   1.04   0.19    1.0    1.00   0.00   1   2     1  4.82
Systolic         11 160 135.64  29.80  132.0  133.97  29.65  48 224   176  0.42
HeartRate        12 160  98.50  26.98   95.0   97.48  25.20  39 192   153  0.44
Previous*        13 160   1.14   0.35    1.0    1.05   0.00   1   2     1  2.01
Type*            14 160   1.68   0.47    2.0    1.73   0.00   1   2     1 -0.77
Fracture*        15 160   1.07   0.26    1.0    1.00   0.00   1   2     1  3.20
PO2*             16 160   1.07   0.25    1.0    1.00   0.00   1   2     1  3.38
PH*              17 160   1.06   0.23    1.0    1.00   0.00   1   2     1  3.82
PCO2*            18 160   1.10   0.30    1.0    1.00   0.00   1   2     1  2.64
Bicarbonate*     19 160   1.06   0.24    1.0    1.00   0.00   1   2     1  3.58
Creatinine*      20 160   1.03   0.17    1.0    1.00   0.00   1   2     1  5.34
Consciousness*   21 160   1.02   0.22    1.0    1.00   0.00   1   3     2  8.69
               kurtosis    se
ID                -1.25 21.85
Status*             NaN  0.00
Age               -0.82  1.61
Sex*              -1.75  0.04
Race*              5.98  0.04
Service*          -1.91  0.04
Cancer*            5.01  0.02
Renal*             9.46  0.02
Infection*        -1.75  0.04
CPR*              21.40  0.02
Systolic           0.34  2.36
HeartRate          0.17  2.13
Previous*          2.06  0.03
Type*             -1.41  0.04
Fracture*          8.27  0.02
PO2*               9.46  0.02
PH*               12.64  0.02
PCO2*              5.01  0.02
Bicarbonate*      10.89  0.02
Creatinine*       26.66  0.01
Consciousness*    74.04  0.02
------------------------------------------------------------ 
group: Died
               vars  n   mean     sd median trimmed    mad min max range  skew
ID                1 40 395.95 250.74    363  386.72 243.89   4 921   917  0.37
Status*           2 40   2.00   0.00      2    2.00   0.00   2   2     0   NaN
Age               3 40  65.12  16.65     68   66.47  11.86  19  92    73 -0.84
Sex*              4 40   1.40   0.50      1    1.38   0.00   1   2     1  0.39
Race*             5 40   1.12   0.46      1    1.00   0.00   1   3     2  3.46
Service*          6 40   1.35   0.48      1    1.31   0.00   1   2     1  0.61
Cancer*           7 40   1.10   0.30      1    1.00   0.00   1   2     1  2.57
Renal*            8 40   1.20   0.41      1    1.12   0.00   1   2     1  1.44
Infection*        9 40   1.60   0.50      2    1.62   0.00   1   2     1 -0.39
CPR*             10 40   1.18   0.38      1    1.09   0.00   1   2     1  1.65
Systolic         11 40 118.83  41.08    126  117.22  32.62  36 256   220  0.60
HeartRate        12 40 100.62  26.49     96   99.78  25.20  55 160   105  0.28
Previous*        13 40   1.18   0.38      1    1.09   0.00   1   2     1  1.65
Type*            14 40   1.95   0.22      2    2.00   0.00   1   2     1 -3.98
Fracture*        15 40   1.07   0.27      1    1.00   0.00   1   2     1  3.11
PO2*             16 40   1.12   0.33      1    1.03   0.00   1   2     1  2.18
PH*              17 40   1.10   0.30      1    1.00   0.00   1   2     1  2.57
PCO2*            18 40   1.10   0.30      1    1.00   0.00   1   2     1  2.57
Bicarbonate*     19 40   1.12   0.33      1    1.03   0.00   1   2     1  2.18
Creatinine*      20 40   1.12   0.33      1    1.03   0.00   1   2     1  2.18
Consciousness*   21 40   1.52   0.82      1    1.41   0.00   1   3     2  1.03
               kurtosis    se
ID                -1.00 39.65
Status*             NaN  0.00
Age                0.73  2.63
Sex*              -1.89  0.08
Race*             10.72  0.07
Service*          -1.67  0.08
Cancer*            4.71  0.05
Renal*             0.09  0.06
Infection*        -1.89  0.08
CPR*               0.73  0.06
Systolic           1.40  6.50
HeartRate         -0.75  4.19
Previous*          0.73  0.06
Type*             14.16  0.03
Fracture*          7.85  0.04
PO2*               2.84  0.05
PH*                4.71  0.05
PCO2*              4.71  0.05
Bicarbonate*       2.84  0.05
Creatinine*        2.84  0.05
Consciousness*    -0.74  0.13

11.2 Age by Status

Ho: the variances for Lived and Died are equal
Ha: the variances are different

Ho: the means are equal
Ha: the means are different

The following is the comparison of variances between the two Status groups for Age.

> with(ICU, tapply(Age, Status, var, na.rm=TRUE))
   Lived     Died 
417.3107 277.1891 

The following code produces the result of the LeveneTest for testing homogeneity of variances.

> leveneTest(Age ~ Status, data=ICU, center="median")
Levene's Test for Homogeneity of Variance (center = "median")
       Df F value  Pr(>F)  
group   1   3.127 0.07855 .
      198                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since the p value = 0.07855 for testing the homogeneity of variances is greater than 0.05, we retain the null hypothesis with a 5% risk of a type 1 error and conclude that the variances for Lived and Died are equal. As such, the Student t-test is used to analyze whether there was a significant difference in means.

> t.test(Age~Status, alternative='two.sided', conf.level=.95, var.equal=TRUE, data=ICU)

    Two Sample t-test

data:  Age by Status
t = -2.7151, df = 198, p-value = 0.007211
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -16.35688  -2.59312
sample estimates:
mean in group Lived  mean in group Died 
             55.650              65.125 
> qt(c(0.025), df=314, lower.tail=TRUE)
[1] -1.967548

As the p-value = 0.007211, 0 is not within the confidence intervals of -16.35688 to -2.59312 and t = -2.7151 is less than -1.967548, we reject the null hypothesis and conclude that the means for Age for the groups Lived and Died are not the same.

11.3 HeartRate by Status

Ho: the variances for Lived and Died are equal
Ha: the variances are different

Ho: the means are equal
Ha: the means are different

The following is the comparison of variances between the two Status groups for HeartRate.

> with(ICU, tapply(HeartRate, Status, var, na.rm=TRUE))
   Lived     Died 
727.8491 701.8814 

The following code produces the result of the LeveneTest for testing homogeneity of variances.

> leveneTest(HeartRate ~ Status, data=ICU, center="median")
Levene's Test for Homogeneity of Variance (center = "median")
       Df F value Pr(>F)
group   1   0.008  0.929
      198               

Since the p-value = 0.929 for testing the homogeneity of variances is greater than 0.05, we retain the null hypothesis with a 5% risk of a type 1 error and conclude that the variances for Lived and Died are equal. As such, the Student t-test is used to analyze whether there was a significant difference in means.

> t.test(HeartRate~Status, alternative='two.sided', conf.level=.95, var.equal=TRUE, data=ICU)

    Two Sample t-test

data:  HeartRate by Status
t = -0.44714, df = 198, p-value = 0.6553
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -11.496845   7.246845
sample estimates:
mean in group Lived  mean in group Died 
             98.500             100.625 
> qt(c(0.025), df=314, lower.tail=TRUE)
[1] -1.967548

As the p-value = 0.6553, 0 is within the confidence intervals of -11.496845 to 7.246845 and t = -0.44714 is greater than -1.967548, we retain the null hypothesis at a 5% risk level of a type 1 error and conclude that the means for HeartRate are the same among those that lived and those that died.

11.4 Systolic by Status

Ho: the variances for Lived and Died are equal
Ha: the variances are different

Ho: the means are equal
Ha: the means are different

The following is the comparison of variances between the two Status groups for Systolic.

> with(ICU, tapply(Systolic, Status, var, na.rm=TRUE))
    Lived      Died 
 888.1301 1687.6353 

The following code produces the result of the LeveneTest for testing homogeneity of variances.

> leveneTest(Systolic ~ Status, data=ICU, center="median")
Levene's Test for Homogeneity of Variance (center = "median")
       Df F value  Pr(>F)  
group   1  4.1872 0.04205 *
      198                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since the p-value = 0.04205 for testing the homogeneity of variances is less than 0.05, we reject the null hypothesis with a 5% risk of a type 1 error and conclude that the variances for Lived and Died are not equal. As such, the Welch two Sample t-test is used to analyze whether there was a significant difference in means.

> t.test(Systolic~Status, alternative="two.sided", conf.level=.95, var.equal=FALSE, data=ICU)

    Welch Two Sample t-test

data:  Systolic by Status
t = 2.4341, df = 49.726, p-value = 0.01856
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  2.938642 30.698858
sample estimates:
mean in group Lived  mean in group Died 
           135.6438            118.8250 
> qt(c(0.025), df=314, lower.tail=TRUE)
[1] -1.967548

As the p-value = 0.01856, 0 is not within the confidence intervals of 2.938642 to 30.698858 and t = 2.4341 is greater than 1.9675, we reject the null hypothesis at a 5% risk level of a type 1 error and conclude that the means for systolic blood pressure are not the same among those that lived and those that died.

11.5 Age by Sex

Ho: the variances for Lived and Died are equal
Ha: the variances are different

Ho: the means are equal
Ha: the means are different

The following is the comparison of variances between the two Sex groups for Age.

> with(ICU, tapply(Age, Sex, var, na.rm=TRUE))
    Male   Female 
378.4780 436.5867 

The following code produces the result of the LeveneTest for testing homogeneity of variances.

> leveneTest(Age ~ Sex, data=ICU, center="median")
Levene's Test for Homogeneity of Variance (center = "median")
       Df F value Pr(>F)
group   1  0.1154 0.7344
      198               

Since the p-value = 0.7344 for testing the homogeneity of variances is greater than 0.05, we retain the null hypothesis with a 5% risk of a type 1 error and conclude that the variances for Lived and Died are equal. As such, the Student t-test is used to analyze whether there was a significant difference in means.

> t.test(Age~Sex, alternative='two.sided', conf.level=.95, var.equal=TRUE, data=ICU)

    Two Sample t-test

data:  Age by Sex
t = -1.3582, df = 198, p-value = 0.1759
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -9.708824  1.789469
sample estimates:
  mean in group Male mean in group Female 
            56.04032             60.00000 
> qt(c(0.025), df=314, lower.tail=TRUE)
[1] -1.967548

As the p-value = 0.1759, 0 is within the confidence intervals of -9.708824 to 1.789469 and t = -1.3582 is greater than - 1.9675, we retain the null hypothesis at a 5% risk level of a type 1 error and conclude that the means for systolic blood pressure are the same between the two groups.

12 Sajiya

13 Difference in proportions

We will conduct z-tests between two categorical variables where the dependent variable is Vital status and the independent variable is binary.

Vital status by Service

Ho: there is no difference between the proportion of patients that died in the medical versus surgical group, risk = 0.05 Ha: there is a difference between the proportion of patients that died in the medical versus surgical group

> pacman::p_load(sjPlot)
> sjt.xtab(ICU$Status, ICU$Service, show.col.prc = TRUE)
Status Service Total
Medical Surgical
Lived 67
72 %
93
86.9 %
160
80 %
Died 26
28 %
14
13.1 %
40
20 %
Total 93
100 %
107
100 %
200
100 %
χ2=5.981 · df=1 · φ=0.185 · p=0.014
> x1<-26; x2<-14; n1<-93; n2<-107
> p1<-x1/n1; p2<-x2/n2
> p<-(x1+x2)/(n1+n2)  
> varp<-p*(1-p)*(1/n1 + 1/n2)  
> stdp<-sqrt(varp)  
> zp<-(p1 - p2)/stdp 
> zp
[1] 2.622729
> 1-pnorm(zp)
[1] 0.004361436

When conducting a two-tail z-test with a 5% level of risk of a type 1 error, the critical values are -1.96 and +1.96. The z-value of 2.622729 does not fall within this confidence interval, thus we reject the null hypothesis that there is no difference between the proportion of patients dying in the ICU from medical procedures versus surgical procedures.

Further, since the p-value of 0.004361436 is less than 0.05, we, again, reject the null hypothesis that the proportions of the two groups are equal.

Lastly, we can see from the table that more of those that died were there for a medical service and more of those that lived were there for a surgical service.

Vital status by Cancer

Ho: there is no difference between the proportions of patients that died in the cancer versus non-cancer group, risk = 0.05 Ha: there is a difference between the proportions of patients that died in the cancer versus non-cancer group

> pacman::p_load(sjPlot)
> sjt.xtab(ICU$Status, ICU$Cancer, show.col.prc = TRUE)
Status Cancer Total
No Yes
Lived 144
80 %
16
80 %
160
80 %
Died 36
20 %
4
20 %
40
20 %
Total 180
100 %
20
100 %
200
100 %
χ2=0.000 · df=1 · φ=0.000 · Fisher’s p=1.000
> x1<-36; x2<-4; n1<-180; n2<-20
> p1<-x1/n1; p2<-x2/n2
> p<-(x1+x2)/(n1+n2)  
> varp<-p*(1-p)*(1/n1 + 1/n2)  
> stdp<-sqrt(varp)  
> zp<-(p1 - p2)/stdp 
> zp
[1] 0
> 1-pnorm(zp)
[1] 0.5

When conducting a two-tail z-test with a 5% level of risk of a type 1 error, the critical values are -1.96 and +1.96. The z-value of 0 falls within this confidence interval, thus we fail to reject the null hypothesis that there is no difference between the proportions of patients that died with versus without cancer as part of the present problem.

Further, since the p-value of 0.5 is greater than 0.05, we, again, fail to reject the null hypothesis that the proportions of the Cancer versus non-cancer patients that died are equal.

Thus, there are equal proportions of those who died with cancer as part of the present problem and without, as well as equal proportions of those who lived with cancer as part of the present problem and without.

Vital status by Previous

Ho: there is no difference between the proportion of patients that died with versus without previous admission to an ICU within 6 months, risk = 0.05 Ha: there is a difference between the proportion of patients that died with versus without previous admission to an ICU within 6 months

> pacman::p_load(sjPlot)
> sjt.xtab(ICU$Status, ICU$Previous, show.col.prc = TRUE)
Status Previous Total
No Yes
Lived 137
80.6 %
23
76.7 %
160
80 %
Died 33
19.4 %
7
23.3 %
40
20 %
Total 170
100 %
30
100 %
200
100 %
χ2=0.061 · df=1 · φ=0.035 · Fisher’s p=0.624
> x1<-33; x2<-7; n1<-170; n2<-30
> p1<-x1/n1; p2<-x2/n2
> p<-(x1+x2)/(n1+n2)  
> varp<-p*(1-p)*(1/n1 + 1/n2)  
> stdp<-sqrt(varp)  
> zp<-(p1 - p2)/stdp 
> zp
[1] -0.4950738
> 1-pnorm(zp)
[1] 0.689726

When conducting a two-tail z-test with a 5% level of risk of a type 1 error, the critical values are -1.96 and +1.96. The z-value of -0.4950738 falls within this confidence interval, thus we fail to reject the null hypothesis that there is no difference between the proportion of patients that died with versus without previous admission to an ICU within 6 months of the current admission.

Further, since the p-value of 0.689726 is greater than 0.05, we, again, fail to reject the null hypothesis that there is no difference between the proportion of patients that died whether or not they were previously admitted to the ICU within the past 6 months.

Lastly, we see from the table that there was a lower percentage of patients that died with no prior ICU admission than patients that died with prior ICU admission.

Vital status by Type

Ho: there is no difference between the proportion of patients that died in the elective admission group versus the emergency admission group, risk = 0.05 Ha: there is a difference between the proportion of patients that died in the elective admission group versus the emergency admission group

> pacman::p_load(sjPlot)
> sjt.xtab(ICU$Status, ICU$Type, show.col.prc = TRUE)
Status Type Total
Elective Emergency
Lived 51
96.2 %
109
74.1 %
160
80 %
Died 2
3.8 %
38
25.9 %
40
20 %
Total 53
100 %
147
100 %
200
100 %
χ2=10.527 · df=1 · φ=0.244 · p=0.001
> x1<-2; x2<-38; n1<-53; n2<-147
> p1<-x1/n1; p2<-x2/n2
> p<-(x1+x2)/(n1+n2)  
> varp<-p*(1-p)*(1/n1 + 1/n2)  
> stdp<-sqrt(varp)  
> zp<-(p1 - p2)/stdp 
> zp
[1] -3.444743
> pnorm(zp)
[1] 0.000285801

When conducting a two-tail z-test with a 5% level of risk of a type 1 error, the critical values are -1.96 and +1.96. The z-value of -3.444743 does not fall within this confidence interval, thus we reject the null hypothesis that there is no difference between the proportion of patients that died in the elective admission group versus the emergency admission group.

Further, since the p-value of 0.000285801 is less than 0.05, we, again, reject the null hypothesis that the proportions of the two groups are equal.

We can see from the table that the emergency admission type had a higher proportion of deaths than the elective admission type did.

We will now conduct a prop.test for categorical variables where the independent variable has more than 2 subgroups.

Vital status by Age

Ho: there is no difference between the proportion of patients that died across the 5 age groups, risk = 0.05 Ha: the proportion of patients that died is different in at least one of the 5 age groups

Binning the Age variable with equal-count bins.

> ICU$Age.binned <- 
+   with(ICU, binVariable(Age,
+    bins=5, method='proportions', 
+   labels=c('Group 1','Group 2','Group 3',
+   'Group 4','Group 5')))
> pacman::p_load(sjPlot)
> sjt.xtab(ICU$Status, ICU$Age.binned, show.col.prc = TRUE)
Status Age.binned Total
Group 1 Group 2 Group 3 Group 4 Group 5
Lived 38
92.7 %
31
79.5 %
34
79.1 %
33
75 %
24
72.7 %
160
80 %
Died 3
7.3 %
8
20.5 %
9
20.9 %
11
25 %
9
27.3 %
40
20 %
Total 41
100 %
39
100 %
43
100 %
44
100 %
33
100 %
200
100 %
χ2=5.930 · df=4 · Cramer’s V=0.172 · p=0.204
> Died  <- c( 3, 8, 9, 11, 9 )
> Total <- c( 41, 39, 43, 44, 33 )
> prop.test(Died, Total)

    5-sample test for equality of proportions without continuity
    correction

data:  Died out of Total
X-squared = 5.93, df = 4, p-value = 0.2044
alternative hypothesis: two.sided
sample estimates:
    prop 1     prop 2     prop 3     prop 4     prop 5 
0.07317073 0.20512821 0.20930233 0.25000000 0.27272727 

Since the p-value of 0.2044 is greater than 0.05, we fail to reject the null hypothesis that there is no difference between the proportion of patients that died across the 5 age groups. We can see from the sample estimates that the proportions are indeed quite similar between the 5 age groups. However, the number of patients that died in age group 1 (the youngest age group) is much lower than the rest. The greatest proportion of patients that died is in group 5 (the oldest age group). The proportions in groups 2 and 3 are quite similar and slightly higher in group 4. However, overall, most of the groups had similar proportions of patients that died, aside from the youngest age group.

Further, when conducting a chi-square test with a 5% level of risk of a type 1 error and 4 degrees of freedom, the critical value is 9.49. Since the chi-squared value of 5.93 is less than this critical value, we, again, fail to reject the null hypothesis that there is no difference between the proportion of patients that died across the 5 age groups.

#binning by equal width instead

> ICU$Age.recodedbywidth <- with(ICU, binVariable(Age, bins=5, method='intervals', labels=c('1','2','3','4',
+   '5')))
> pacman::p_load(sjPlot)
> sjt.xtab(ICU$Status, ICU$Age.recodedbywidth, show.col.prc = TRUE)
Status Age.recodedbywidth Total
1 2 3 4 5
Lived 29
93.5 %
17
89.5 %
36
78.3 %
57
76 %
21
72.4 %
160
80 %
Died 2
6.5 %
2
10.5 %
10
21.7 %
18
24 %
8
27.6 %
40
20 %
Total 31
100 %
19
100 %
46
100 %
75
100 %
29
100 %
200
100 %
χ2=6.502 · df=4 · Cramer’s V=0.180 · Fisher’s p=0.144
> Died  <- c( 2, 2, 10, 18, 8 )
> Total <- c( 31, 19, 46, 75, 29 )
> prop.test(Died, Total)

    5-sample test for equality of proportions without continuity
    correction

data:  Died out of Total
X-squared = 6.5023, df = 4, p-value = 0.1646
alternative hypothesis: two.sided
sample estimates:
    prop 1     prop 2     prop 3     prop 4     prop 5 
0.06451613 0.10526316 0.21739130 0.24000000 0.27586207 

#prop1 is still lower than the other ones so no difference there but prop 2 is much lower than before (was pretty equal before so not sure what that means)

Vital status by Systolic Blood Pressure

Ho: there is no difference between the proportion of patients that died across the 6 systolic blood pressure level groups, risk = 0.05 Ha: the proportion of patients that died is different in at least one of the 6 systolic blood pressure level groups

> ICU <- 
+   within(ICU, {
+   Systolic.grouped <- Recode(Systolic, 
+   '0:80="hypotension"; 80:120="normal"; 120:129="elevated"; 130:139="stage 1 hypertension"; 140:180="stage 2 hypertension"; 180:260="stage 3 hypertension"; ;',
+    as.factor=TRUE)
+ })
> pacman::p_load(sjPlot)
> sjt.xtab(ICU$Status, ICU$Systolic.grouped, show.col.prc = TRUE)
Status Systolic.grouped Total
elevated hypotension normal stage 1 hypertension stage 2 hypertension stage 3 hypertension
Lived 14
82.4 %
3
25 %
52
85.2 %
24
80 %
54
83.1 %
13
86.7 %
160
80 %
Died 3
17.6 %
9
75 %
9
14.8 %
6
20 %
11
16.9 %
2
13.3 %
40
20 %
Total 17
100 %
12
100 %
61
100 %
30
100 %
65
100 %
15
100 %
200
100 %
χ2=24.597 · df=5 · Cramer’s V=0.351 · Fisher’s p=0.002
> Died  <- c( 3, 9, 9, 6, 11, 2 )
> Total <- c( 17, 12, 61, 30, 65, 15 )
> prop.test(Died, Total)

    6-sample test for equality of proportions without continuity
    correction

data:  Died out of Total
X-squared = 24.597, df = 5, p-value = 0.0001667
alternative hypothesis: two.sided
sample estimates:
   prop 1    prop 2    prop 3    prop 4    prop 5    prop 6 
0.1764706 0.7500000 0.1475410 0.2000000 0.1692308 0.1333333 

Since the p-value of 0.0001667 is less than 0.05, we will reject the null hypothesis that there is no difference between the proportion of patients that died in the 6 systolic blood pressure level groups. We can see from the sample estimates that the proportions are indeed different between the 6 groups, with the highest proportion in the hypotenstion range and in the stage 1 hypertension range. The proportions in the elevated, normal, stage 2 hypertension and stage 3 hypertension groups are much lower and are quite close to each other. Thus, the hypotension and stage 1 hypertension groups had higher proportions of patients that died.

Further, when conducting a chi-square test with a 5% level of risk of a type 1 error and 5 degrees of freedom, the critical value is 11.07. Since the chi-squared value of 24.597 is greater than this critical value, we will, again, reject the null hypothesis that there is no difference between the proportion of patients that died in the 6 systolic blood pressure level groups.

Vital status by Heart Rate

Ho: there is no difference between the proportion of patients that died across the bradychardia, normal heart rate and elevated heart rate groups, risk = 0.05 Ha: the proportion of patients that died is different in at least one of the 3 heart rate categories

> ICU <- 
+   within(ICU, {
+   HeartRate.grouped <- Recode(HeartRate, 
+   '0:60="bradycardia"; 60:100="normal"; 100:200="elevated"',
+    as.factor=TRUE)
+ })
> pacman::p_load(sjPlot)
> sjt.xtab(ICU$Status, ICU$HeartRate.grouped, show.col.prc = TRUE)
Status HeartRate.grouped Total
bradycardia elevated normal
Lived 12
80 %
64
80 %
84
80 %
160
80 %
Died 3
20 %
16
20 %
21
20 %
40
20 %
Total 15
100 %
80
100 %
105
100 %
200
100 %
χ2=0.000 · df=2 · Cramer’s V=0.000 · Fisher’s p=1.000
> Died  <- c( 3, 16, 21 )
> Total <- c( 15, 80, 105 )
> prop.test(Died, Total)

    3-sample test for equality of proportions without continuity
    correction

data:  Died out of Total
X-squared = 0, df = 2, p-value = 1
alternative hypothesis: two.sided
sample estimates:
prop 1 prop 2 prop 3 
   0.2    0.2    0.2 

Since the p-value of 1 is greater than 0.05, we fail to reject the null hypothesis that there is no difference between proportion of patients that died across the bradychardia, normal heart rate and elevated heart rate groups. An exact p-value of 1 means that the difference in proportions of patients that died between the 3 heart rate groups is exactly 0. Thus, there were an equal number of patients that died with bradycardia, normal heart rate and elevated heart rate.

Further, when conducting a chi-square test with a 5% level of risk of a type 1 error and 2 degrees of freedom, the critical value is 5.99. Since the chi-squared value of 0 is lower than this critical value, we, again, fail to reject the null hypothesis that there is no difference between the proportion of patients that died having bradychardia, normal heart rate and elevated heart rate.

Vital status by Race

Ho: there is no difference between the proportion of patients that died that were White, Black or otherwise, risk = 0.05 Ha: the proportion of patients that died is different in at least one of the race categories

> pacman::p_load(sjPlot)
> sjt.xtab(ICU$Status, ICU$Race, show.col.prc = TRUE)
Status Race Total
White Black Other
Lived 138
78.9 %
14
93.3 %
8
80 %
160
80 %
Died 37
21.1 %
1
6.7 %
2
20 %
40
20 %
Total 175
100 %
15
100 %
10
100 %
200
100 %
χ2=1.810 · df=2 · Cramer’s V=0.095 · Fisher’s p=0.489
> Died  <- c( 37, 1, 2 )
> Total <- c( 175, 15, 10 )
> prop.test(Died, Total)

    3-sample test for equality of proportions without continuity
    correction

data:  Died out of Total
X-squared = 1.8095, df = 2, p-value = 0.4046
alternative hypothesis: two.sided
sample estimates:
    prop 1     prop 2     prop 3 
0.21142857 0.06666667 0.20000000 

Since the p-value of 0.4046 is greater than 0.05, we fail to reject the null hypothesis that there is no difference between the proportion of patients that died that were White, Black or otherwise.

Further, when conducting a chi-square test with a 5% level of risk of a type 1 error and 2 degrees of freedom, the critical value is 5.99. Since the chi-squared value of 1.8095 is less than this critical value, we, again, fail to reject the null hypothesis that there is no difference between the proportion of patients that died that were White, Black or otherwise.

When looking at the sample estimates and at the table, we see that there was almost no difference between the proportion of White patients that died and the proportion of ‘Other’ patients that died. However, the proportion of Black patients that died was slightly lower than the White and ‘Other’ groups.

Vital status by Consciousness

Ho: there is no difference between the proportion of patients that died across the Conscious, Deep Stupor and Coma groups, risk = 0.05 Ha: the proportion of patients that died is different in at least one of the Conscious, Deep Stupor and Coma groups

> pacman::p_load(sjPlot)
> sjt.xtab(ICU$Status, ICU$Consciousness, show.col.prc = TRUE)
Status Consciousness Total
Conscious Deep Stupor Coma
Lived 158
85.4 %
0
0 %
2
20 %
160
80 %
Died 27
14.6 %
5
100 %
8
80 %
40
20 %
Total 185
100 %
5
100 %
10
100 %
200
100 %
χ2=45.878 · df=2 · Cramer’s V=0.479 · Fisher’s p=0.000
> Died  <- c( 27, 5, 8 )
> Total <- c( 185, 5, 10 )
> prop.test(Died, Total)

    3-sample test for equality of proportions without continuity
    correction

data:  Died out of Total
X-squared = 45.878, df = 2, p-value = 1.091e-10
alternative hypothesis: two.sided
sample estimates:
   prop 1    prop 2    prop 3 
0.1459459 1.0000000 0.8000000 

Since the p-value of 1.091e-10 is less than 0.05, we will reject the null hypothesis that there is no difference between the proportion of patients that died across the Conscious, Deep Stupor and Coma groups.

Further, when conducting a chi-square test with a 5% level of risk of a type 1 error and 2 degrees of freedom, the critical value is 5.99. Since the chi-squared value of 45.878 is greater than this critical value, we will, again, reject the null hypothesis that there is no difference between the proportion of patients that died across the 3 Consciousness groups.

With a proportion of 1, the Deep Stupor group had the highest proportion of patients that died as all the patients with a deep stupor died. The Coma group has the second highest proportion of patients that died with a sample estimate of 80% dying. The lowest number of deaths was in the group of patients that had no coma or stupor, which is expected. Those with a deep stupor had a higher proportion of deaths than those in a coma.

From the tests of equal proportions, we note that the the following variables had a statistically significant impact on the Vital status of the patient: Service (type of procedure), Type (of admission) and Conciousness.

The following variables were not significant predictors of Vital status of the patient: Cancer, Previous (ICU admission within the last 6 months), Age, Systolic Blood Pressure, Heart Rate and Race.

14 Bassma

14.1 Detection of outliers of Int Variables

> par(mfrow=c(1,3))
> boxplot(ICU$Age, main="Boxplot of Age")
> boxplot(ICU$Systolic, main="Boxplot of Systolic Blood Pressure")
> boxplot(ICU$HeartRate, main="Boxplot of Heart Rate")

As we can see above, the outliers are present in systolic blood pressure and heart rate. To see the values of the outliers, please see below, where the first row includes the outliers for systolic blood pressure, and the second includes outliers for heart rate.

> systolic_outlier<-boxplot.stats(ICU$Systolic)
> heartrate_outlier<-boxplot.stats(ICU$HeartRate)
> systolic_outlier$out
[1]  48 212 224  36 256
> heartrate_outlier$out
[1] 192

14.2 Crosstabs (relations between categorical variables)

> sjt.xtab(ICU$Status, ICU$Sex, show.col.prc = TRUE)
Status Sex Total
Male Female
Lived 100
80.6 %
60
78.9 %
160
80 %
Died 24
19.4 %
16
21.1 %
40
20 %
Total 124
100 %
76
100 %
200
100 %
χ2=0.012 · df=1 · φ=0.021 · p=0.913
> sjt.xtab(ICU$Status, ICU$Race, show.col.prc = TRUE)
Status Race Total
White Black Other
Lived 138
78.9 %
14
93.3 %
8
80 %
160
80 %
Died 37
21.1 %
1
6.7 %
2
20 %
40
20 %
Total 175
100 %
15
100 %
10
100 %
200
100 %
χ2=1.810 · df=2 · Cramer’s V=0.095 · Fisher’s p=0.496
> sjt.xtab(ICU$Status, ICU$Service, show.col.prc = TRUE)
Status Service Total
Medical Surgical
Lived 67
72 %
93
86.9 %
160
80 %
Died 26
28 %
14
13.1 %
40
20 %
Total 93
100 %
107
100 %
200
100 %
χ2=5.981 · df=1 · φ=0.185 · p=0.014
> sjt.xtab(ICU$Status, ICU$Cancer, show.col.prc = TRUE)
Status Cancer Total
No Yes
Lived 144
80 %
16
80 %
160
80 %
Died 36
20 %
4
20 %
40
20 %
Total 180
100 %
20
100 %
200
100 %
χ2=0.000 · df=1 · φ=0.000 · Fisher’s p=1.000
> sjt.xtab(ICU$Status, ICU$Renal, show.col.prc = TRUE)
Status Renal Total
No Yes
Lived 149
82.3 %
11
57.9 %
160
80 %
Died 32
17.7 %
8
42.1 %
40
20 %
Total 181
100 %
19
100 %
200
100 %
χ2=4.976 · df=1 · φ=0.179 · Fisher’s p=0.029
> sjt.xtab(ICU$Status, ICU$Infection, show.col.prc = TRUE)
Status Infection Total
No Yes
Lived 100
86.2 %
60
71.4 %
160
80 %
Died 16
13.8 %
24
28.6 %
40
20 %
Total 116
100 %
84
100 %
200
100 %
χ2=5.759 · df=1 · φ=0.182 · p=0.016
> sjt.xtab(ICU$Status, ICU$CPR, show.col.prc = TRUE)
Status CPR Total
No Yes
Lived 154
82.4 %
6
46.2 %
160
80 %
Died 33
17.6 %
7
53.8 %
40
20 %
Total 187
100 %
13
100 %
200
100 %
χ2=7.821 · df=1 · φ=0.223 · Fisher’s p=0.005
> sjt.xtab(ICU$Status, ICU$Previous, show.col.prc = TRUE)
Status Previous Total
No Yes
Lived 137
80.6 %
23
76.7 %
160
80 %
Died 33
19.4 %
7
23.3 %
40
20 %
Total 170
100 %
30
100 %
200
100 %
χ2=0.061 · df=1 · φ=0.035 · Fisher’s p=0.624
> sjt.xtab(ICU$Status, ICU$Type, show.col.prc = TRUE)
Status Type Total
Elective Emergency
Lived 51
96.2 %
109
74.1 %
160
80 %
Died 2
3.8 %
38
25.9 %
40
20 %
Total 53
100 %
147
100 %
200
100 %
χ2=10.527 · df=1 · φ=0.244 · p=0.001
> sjt.xtab(ICU$Status, ICU$Fracture, show.col.prc = TRUE)
Status Fracture Total
No Yes
Lived 148
80 %
12
80 %
160
80 %
Died 37
20 %
3
20 %
40
20 %
Total 185
100 %
15
100 %
200
100 %
χ2=0.000 · df=1 · φ=0.000 · Fisher’s p=1.000
> sjt.xtab(ICU$Status, ICU$PO2, show.col.prc = TRUE)
Status PO2 Total
No Yes
Lived 149
81 %
11
68.8 %
160
80 %
Died 35
19 %
5
31.2 %
40
20 %
Total 184
100 %
16
100 %
200
100 %
χ2=0.718 · df=1 · φ=0.083 · Fisher’s p=0.324
> sjt.xtab(ICU$Status, ICU$PH, show.col.prc = TRUE)
Status PH Total
No Yes
Lived 151
80.7 %
9
69.2 %
160
80 %
Died 36
19.3 %
4
30.8 %
40
20 %
Total 187
100 %
13
100 %
200
100 %
χ2=0.416 · df=1 · φ=0.071 · Fisher’s p=0.297
> sjt.xtab(ICU$Status, ICU$PCO2, show.col.prc = TRUE)
Status PCO2 Total
No Yes
Lived 144
80 %
16
80 %
160
80 %
Died 36
20 %
4
20 %
40
20 %
Total 180
100 %
20
100 %
200
100 %
χ2=0.000 · df=1 · φ=0.000 · Fisher’s p=1.000
> sjt.xtab(ICU$Status, ICU$Bicarbonate, show.col.prc = TRUE)
Status Bicarbonate Total
No Yes
Lived 150
81.1 %
10
66.7 %
160
80 %
Died 35
18.9 %
5
33.3 %
40
20 %
Total 185
100 %
15
100 %
200
100 %
χ2=1.014 · df=1 · φ=0.095 · Fisher’s p=0.187
> sjt.xtab(ICU$Status, ICU$Creatinine, show.col.prc = TRUE)
Status Creatinine Total
No Yes
Lived 155
81.6 %
5
50 %
160
80 %
Died 35
18.4 %
5
50 %
40
20 %
Total 190
100 %
10
100 %
200
100 %
χ2=4.112 · df=1 · φ=0.172 · Fisher’s p=0.029
> sjt.xtab(ICU$Status, ICU$Consciousness, show.col.prc = TRUE)
Status Consciousness Total
Conscious Deep Stupor Coma
Lived 158
85.4 %
0
0 %
2
20 %
160
80 %
Died 27
14.6 %
5
100 %
8
80 %
40
20 %
Total 185
100 %
5
100 %
10
100 %
200
100 %
χ2=45.878 · df=2 · Cramer’s V=0.479 · Fisher’s p=0.000

Statistically significant crosstabs include: Creatinine, Type, CPR, Infection,Renal, Service.

14.3 Logistic Regression

Prior to conducting a logistic regression, we would like to break down the systolic blood pressure into levels. We will use clinically defined levels for hypo, normal, elevated and hyper. Ranges below 90 are considered hypo, whereas ranges from 90 to 120 are considered normal, ranges between 120 and 129 are considered elevated, and 130 plus are considered hyper.

> ICU$Systolic<-cut(ICU$Systolic, breaks=c(0,89,119,129,256))
> levels(ICU$Systolic)<-c("hypo", "normal", "elevated", "hyper")

Below is the logistic regression model with all the predictors in the dataset. As we can see, key predictors that are shown to be statistically significant are age, cancer, systolic pressure, fracture, arterial blood gas concentration, type, and consciousness. We have some confidence in our model due to the small difference between the null and residual deviance. Some of our confidence intervals have zero in them, leading us to have some doubt in our model.

> ICU.m<-glm(formula=Status~Age+Sex+Race+Service+Cancer+Renal+Infection+CPR+Systolic+HeartRate+Previous+Type+Fracture+PO2+PH+PCO2+Bicarbonate+Creatinine+Consciousness, family=binomial(logit), data=ICU)
> summary(ICU.m)

Call:
glm(formula = Status ~ Age + Sex + Race + Service + Cancer + 
    Renal + Infection + CPR + Systolic + HeartRate + Previous + 
    Type + Fracture + PO2 + PH + PCO2 + Bicarbonate + Creatinine + 
    Consciousness, family = binomial(logit), data = ICU)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.45809  -0.52365  -0.17347  -0.00019   2.92298  

Coefficients:
                           Estimate Std. Error z value Pr(>|z|)   
(Intercept)              -5.913e+00  2.250e+00  -2.627  0.00861 **
Age                       5.124e-02  1.829e-02   2.802  0.00508 **
SexFemale                -6.228e-01  5.563e-01  -1.119  0.26295   
RaceBlack                -1.628e+01  1.320e+03  -0.012  0.99015   
RaceOther                 4.335e-01  1.312e+00   0.330  0.74115   
ServiceSurgical          -5.792e-01  6.411e-01  -0.904  0.36625   
CancerYes                 3.157e+00  1.141e+00   2.766  0.00567 **
RenalYes                  1.605e-01  8.434e-01   0.190  0.84906   
InfectionYes             -4.125e-02  5.657e-01  -0.073  0.94186   
CPRYes                    1.347e+00  9.901e-01   1.360  0.17372   
Systolicnormal           -2.175e+00  1.044e+00  -2.084  0.03713 * 
Systolicelevated         -1.612e+00  1.114e+00  -1.447  0.14782   
Systolichyper            -2.061e+00  1.026e+00  -2.008  0.04465 * 
HeartRate                -9.748e-04  1.029e-02  -0.095  0.92455   
PreviousYes               1.013e+00  6.937e-01   1.460  0.14439   
TypeEmergency             3.396e+00  1.330e+00   2.553  0.01068 * 
FractureYes               1.566e+00  1.101e+00   1.422  0.15506   
PO2Yes                   -7.044e-01  9.923e-01  -0.710  0.47781   
PHYes                     1.526e+00  1.229e+00   1.241  0.21463   
PCO2Yes                  -2.447e+00  1.235e+00  -1.981  0.04756 * 
BicarbonateYes           -8.800e-03  9.056e-01  -0.010  0.99225   
CreatinineYes            -2.035e-01  1.139e+00  -0.179  0.85827   
ConsciousnessDeep Stupor  3.538e+01  2.568e+03   0.014  0.98901   
ConsciousnessComa         3.373e+00  1.359e+00   2.482  0.01307 * 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 200.16  on 199  degrees of freedom
Residual deviance: 112.67  on 176  degrees of freedom
AIC: 160.67

Number of Fisher Scoring iterations: 17
> confint(ICU.m)
                                 2.5 %      97.5 %
(Intercept)               -10.71446902 -1.69601255
Age                         0.01835185  0.09076391
SexFemale                  -1.76222223  0.44150426
RaceBlack                -406.17117888 48.28576524
RaceOther                  -2.78151725  2.76940774
ServiceSurgical            -1.87453755  0.66822713
CancerYes                   1.04815774  5.66235244
RenalYes                   -1.58805858  1.78878733
InfectionYes               -1.17641537  1.06336175
CPRYes                     -0.63018852  3.34518326
Systolicnormal             -4.37115591 -0.21283200
Systolicelevated           -3.96121708  0.47666682
Systolichyper              -4.25385307 -0.16344121
HeartRate                  -0.02182989  0.01893995
PreviousYes                -0.38940841  2.37091562
TypeEmergency               1.20038241  6.70600040
FractureYes                -0.77450551  3.72610086
PO2Yes                     -2.80364875  1.17162344
PHYes                      -0.83446254  4.05350727
PCO2Yes                    -5.12715251 -0.23880882
BicarbonateYes             -1.88096203  1.74562784
CreatinineYes              -2.55264347  1.98603281
ConsciousnessDeep Stupor -229.97748940          NA
ConsciousnessComa           0.93736975  6.35058207
> exp(coef(ICU.m))
             (Intercept)                      Age                SexFemale 
            2.704937e-03             1.052574e+00             5.364495e-01 
               RaceBlack                RaceOther          ServiceSurgical 
            8.463168e-08             1.542613e+00             5.603338e-01 
               CancerYes                 RenalYes             InfectionYes 
            2.350715e+01             1.174101e+00             9.595848e-01 
                  CPRYes           Systolicnormal         Systolicelevated 
            3.845560e+00             1.135578e-01             1.995610e-01 
           Systolichyper                HeartRate              PreviousYes 
            1.273357e-01             9.990257e-01             2.752703e+00 
           TypeEmergency              FractureYes                   PO2Yes 
            2.983051e+01             4.788341e+00             4.944214e-01 
                   PHYes                  PCO2Yes           BicarbonateYes 
            4.598443e+00             8.656750e-02             9.912381e-01 
           CreatinineYes ConsciousnessDeep Stupor        ConsciousnessComa 
            8.159074e-01             2.322342e+15             2.915368e+01 

14.3.1 Final model

> ICU.final.m<-glm(formula=Status~Age+Cancer+Systolic+PCO2+Consciousness+Type, family=binomial(logit),data=ICU)
> summary(ICU.final.m)

Call:
glm(formula = Status ~ Age + Cancer + Systolic + PCO2 + Consciousness + 
    Type, family = binomial(logit), data = ICU)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.9242  -0.5616  -0.3078  -0.1039   2.5967  

Coefficients:
                           Estimate Std. Error z value Pr(>|z|)   
(Intercept)                -5.46992    1.78152  -3.070  0.00214 **
Age                         0.03715    0.01318   2.819  0.00481 **
CancerYes                   2.55498    1.06544   2.398  0.01648 * 
Systolicnormal             -2.17702    0.87730  -2.481  0.01308 * 
Systolicelevated           -1.71982    0.94888  -1.812  0.06991 . 
Systolichyper              -2.02455    0.80102  -2.527  0.01149 * 
PCO2Yes                    -1.46173    0.86529  -1.689  0.09116 . 
ConsciousnessDeep Stupor   20.75726 1561.13214   0.013  0.98939   
ConsciousnessComa           2.76952    0.97924   2.828  0.00468 **
TypeEmergency               3.65070    1.28917   2.832  0.00463 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 200.16  on 199  degrees of freedom
Residual deviance: 125.60  on 190  degrees of freedom
AIC: 145.6

Number of Fisher Scoring iterations: 16
> confint(ICU.final.m)
                                 2.5 %      97.5 %
(Intercept)                -9.46575467 -2.26757094
Age                         0.01291117  0.06506014
CancerYes                   0.59112234  4.95935713
Systolicnormal             -4.01598420 -0.51648433
Systolicelevated           -3.71304718  0.07016753
Systolichyper              -3.71337117 -0.50242069
PCO2Yes                    -3.36824530  0.08432436
ConsciousnessDeep Stupor -140.94294699          NA
ConsciousnessComa           0.99821470  4.98622776
TypeEmergency               1.60812369  6.92583339
> exp(coef(ICU.final.m))
             (Intercept)                      Age                CancerYes 
            4.211568e-03             1.037849e+00             1.287104e+01 
          Systolicnormal         Systolicelevated            Systolichyper 
            1.133791e-01             1.790977e-01             1.320539e-01 
                 PCO2Yes ConsciousnessDeep Stupor        ConsciousnessComa 
            2.318346e-01             1.034580e+09             1.595094e+01 
           TypeEmergency 
            3.850154e+01 
> pacman::p_load(rsample)
> set.seed(78)
> train_test_split <- initial_split(ICU)
> train <- training(train_test_split)
> test <- testing(train_test_split)
> (samp <- dim(train_test_split))
  analysis assessment          n          p 
       150         50        200         25 
> ICU.tr<-glm(Status~Age+Cancer+Systolic+PCO2+Consciousness+Type, family=binomial(logit),data=train)
> summary(ICU.tr)

Call:
glm(formula = Status ~ Age + Cancer + Systolic + PCO2 + Consciousness + 
    Type, family = binomial(logit), data = train)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.9703  -0.5116  -0.2826  -0.1133   2.5197  

Coefficients:
                           Estimate Std. Error z value Pr(>|z|)   
(Intercept)                -5.04848    1.93964  -2.603  0.00925 **
Age                         0.03639    0.01569   2.319  0.02042 * 
CancerYes                   2.56538    1.09773   2.337  0.01944 * 
Systolicnormal             -2.44624    1.06730  -2.292  0.02191 * 
Systolicelevated           -2.27752    1.23500  -1.844  0.06516 . 
Systolichyper              -1.99802    0.96884  -2.062  0.03918 * 
PCO2Yes                    -1.38288    0.95161  -1.453  0.14617   
ConsciousnessDeep Stupor   19.82640 1234.36430   0.016  0.98718   
ConsciousnessComa           2.70147    1.01560   2.660  0.00781 **
TypeEmergency               3.36914    1.28536   2.621  0.00876 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 144.406  on 149  degrees of freedom
Residual deviance:  89.393  on 140  degrees of freedom
AIC: 109.39

Number of Fisher Scoring iterations: 15

14.3.2 Testing & training the model

> predicted.val<-predict(ICU.tr, newdata=test)
> head(predicted.val)
        1         4        13        16        19        20 
-2.694963 -1.712560 -4.645074 -5.154468 -1.385092 -4.706261 
> predicted.prob <- predict(ICU.tr, test, type = "response")
> head( predict( ICU.tr, test, type="response") )
          1           4          13          16          19          20 
0.063271249 0.152831961 0.009517366 0.005740407 0.200192370 0.008957547 
> predicted.classes <- ifelse( predicted.prob > 0.5, "Lived", "Dead" )
> head(predicted.classes)
     1      4     13     16     19     20 
"Dead" "Dead" "Dead" "Dead" "Dead" "Dead" 
> pacman::p_load(car)
> car::vif(ICU.tr)
                  GVIF Df GVIF^(1/(2*Df))
Age           1.088868  1        1.043488
Cancer        1.541005  1        1.241372
Systolic      1.351654  3        1.051504
PCO2          1.430087  1        1.195862
Consciousness 1.162778  2        1.038423
Type          1.439780  1        1.199909
> fit0<-glm(Status~1, family=binomial(logit), data=train)
> summary(fit0)

Call:
glm(formula = Status ~ 1, family = binomial(logit), data = train)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.6428  -0.6428  -0.6428  -0.6428   1.8322  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.4718     0.2095  -7.024 2.16e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 144.41  on 149  degrees of freedom
Residual deviance: 144.41  on 149  degrees of freedom
AIC: 146.41

Number of Fisher Scoring iterations: 4
> fitall<-glm(Status~Age+Cancer+Systolic+PCO2+Consciousness+Type, family=binomial(logit), data=train)
> summary(fitall)

Call:
glm(formula = Status ~ Age + Cancer + Systolic + PCO2 + Consciousness + 
    Type, family = binomial(logit), data = train)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.9703  -0.5116  -0.2826  -0.1133   2.5197  

Coefficients:
                           Estimate Std. Error z value Pr(>|z|)   
(Intercept)                -5.04848    1.93964  -2.603  0.00925 **
Age                         0.03639    0.01569   2.319  0.02042 * 
CancerYes                   2.56538    1.09773   2.337  0.01944 * 
Systolicnormal             -2.44624    1.06730  -2.292  0.02191 * 
Systolicelevated           -2.27752    1.23500  -1.844  0.06516 . 
Systolichyper              -1.99802    0.96884  -2.062  0.03918 * 
PCO2Yes                    -1.38288    0.95161  -1.453  0.14617   
ConsciousnessDeep Stupor   19.82640 1234.36430   0.016  0.98718   
ConsciousnessComa           2.70147    1.01560   2.660  0.00781 **
TypeEmergency               3.36914    1.28536   2.621  0.00876 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 144.406  on 149  degrees of freedom
Residual deviance:  89.393  on 140  degrees of freedom
AIC: 109.39

Number of Fisher Scoring iterations: 15
> anova(fitall, fit0, test="Chisq")
Analysis of Deviance Table

Model 1: Status ~ Age + Cancer + Systolic + PCO2 + Consciousness + Type
Model 2: Status ~ 1
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
1       140     89.393                          
2       149    144.406 -9  -55.013 1.211e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> pacman::p_load(survey)
> regTermTest(ICU.tr, "Age")
Wald test for Age
 in glm(formula = Status ~ Age + Cancer + Systolic + PCO2 + Consciousness + 
    Type, family = binomial(logit), data = train)
F =  5.375635  on  1  and  140  df: p= 0.021868 
> regTermTest(ICU.tr, "Cancer")
Wald test for Cancer
 in glm(formula = Status ~ Age + Cancer + Systolic + PCO2 + Consciousness + 
    Type, family = binomial(logit), data = train)
F =  5.461517  on  1  and  140  df: p= 0.020858 
> regTermTest(ICU.tr, "Systolic")
Wald test for Systolic
 in glm(formula = Status ~ Age + Cancer + Systolic + PCO2 + Consciousness + 
    Type, family = binomial(logit), data = train)
F =  1.890836  on  3  and  140  df: p= 0.13391 
> regTermTest(ICU.tr, "Type")
Wald test for Type
 in glm(formula = Status ~ Age + Cancer + Systolic + PCO2 + Consciousness + 
    Type, family = binomial(logit), data = train)
F =  6.870492  on  1  and  140  df: p= 0.0097314 
> regTermTest(ICU.tr, "PCO2")
Wald test for PCO2
 in glm(formula = Status ~ Age + Cancer + Systolic + PCO2 + Consciousness + 
    Type, family = binomial(logit), data = train)
F =  2.111782  on  1  and  140  df: p= 0.14841 
> regTermTest(ICU.tr, "Consciousness")
Wald test for Consciousness
 in glm(formula = Status ~ Age + Cancer + Systolic + PCO2 + Consciousness + 
    Type, family = binomial(logit), data = train)
F =  3.537891  on  2  and  140  df: p= 0.031702 

14.4 Classification and Regression Trees

> pacman::p_load(rpart, rpart.plot, rattle, dplyr)
> set.seed(2715)
> 
> #final model 
> icutree=rpart(Status~Age+Cancer+Systolic+Type+PCO2+Consciousness+Type+Fracture+Sex+Race+Service+Renal+Infection+CPR+HeartRate+Previous+PO2+PH+Bicarbonate+Creatinine+Consciousness, method="class", data=ICU)
> #rpart.plot(icutree,
>            #extra = 104, # show fitted class, probs, percentages
>            #box.palette = "GnYlRd", # color scheme
>            #prefix = "Status\n",
>            #branch.lty = 1, # solid branch lines, 3 = dotted
>            #shadow.col = "gray", # shadows under the node boxes
>            #split.prefix = "is ", # put "is " before split text
>            #split.suffix = "?", # put "?" after split text
>            #nn = TRUE, # display the node numbers
>            #tweak = 1.2)
> fancyRpartPlot(icutree)

> printcp(icutree)

Classification tree:
rpart(formula = Status ~ Age + Cancer + Systolic + Type + PCO2 + 
    Consciousness + Type + Fracture + Sex + Race + Service + 
    Renal + Infection + CPR + HeartRate + Previous + PO2 + PH + 
    Bicarbonate + Creatinine + Consciousness, data = ICU, method = "class")

Variables actually used in tree construction:
[1] Consciousness Systolic     

Root node error: 40/200 = 0.2

n= 200 

     CP nsplit rel error xerror    xstd
1 0.275      0     1.000  1.000 0.14142
2 0.075      1     0.725  0.725 0.12449
3 0.010      2     0.650  0.725 0.12449
> predicted.classes <- icutree %>% predict(, data=icu, type = "class")
> head(predicted.classes, 12)
    1     2     3     4     5     6     7     8     9    10    11    12 
Lived Lived Lived Lived Lived Lived Lived Lived Lived Lived Lived Lived 
Levels: Lived Died
> mean(predicted.classes == ICU$Status)
[1] 0.87
> plotcp(icutree)

 

A work by Ekene Olatunji, Caterine Nassralla, Victoria Chin, Bassam Chamas, and Sajiya Somji

Msc. eHealth

DeGroote School of Business

eHealth 705 Statistics for eHealth - Final Project